PSCI 2270 - Week 4
Department of Political Science, Vanderbilt University
September 21, 2023
Question: What data can we collect to study factors that affect election participation?
Applying CLT/LLN for estimation
Logic of causal inference
Probability:
Law of Large Numbers
Central Limit Theorem:
In real data, we will have a set of \(n\) measurements on a variable: \(X_1\) , \(X_2\), … , \(X_n\)
Empirical analyses: sums or means of these \(n\) measurements
Law of Large Numbers (LLN)
Let \(X_1\) , … , \(X_n\) be independent and identically distributed random variables with mean \(\mu\) and finite variance \(\sigma^2\). Then, the sample mean \(\bar{X}_n = \frac{1}{n} \sum_{i = 1}^{n} X_i\) converges to \(\mu\) as \(n\) gets large.
Intuition: The probability of \(\bar{X}_n\) being “far away” from \(\mu\) goes to \(0\) as \(n\) gets big
The distribution of sample mean “collapses” to population mean
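A small simulation can make the LLN intuition concrete. This is only a sketch under assumed settings: the Uniform(0, 1) population (mean 0.5), the tolerance of 0.05 for “far away”, and the seed are illustrative choices, not part of the lecture.

```python
import random

random.seed(2270)  # arbitrary seed, only so the run is reproducible

MU = 0.5     # population mean of Uniform(0, 1) draws (assumed population)
TOL = 0.05   # assumed definition of "far away": more than 0.05 from MU
REPS = 2000  # number of repeated samples per sample size

def share_far_away(n):
    """Fraction of repeated samples whose mean lands more than TOL from MU."""
    far = 0
    for _ in range(REPS):
        x_bar = sum(random.random() for _ in range(n)) / n
        if abs(x_bar - MU) > TOL:
            far += 1
    return far / REPS

shares = {n: share_far_away(n) for n in (10, 100, 1000)}
for n, share in shares.items():
    print(f"n = {n:>4}: share of sample means far from mu = {share:.3f}")
```

The printed share should shrink toward 0 as \(n\) grows, which is the “collapse” of the sample mean onto the population mean.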
The normal distribution is the classic “bell-shaped” curve.
Three key properties:
Central Limit Theorem (CLT)
Let \(X_1\) , … , \(X_n\) be independent and identically distributed random variables with mean \(\mu\) and variance \(\sigma^2\). Then, \(\bar{X}_n\) will be approximately distributed \(N ( \mu, \sigma^2 / n )\) in large samples.
Approximation is better as \(n\) goes up \(\Rightarrow\) asymptotics
“Sample means tend to be normally distributed as samples get large.”
By CLT, sample mean \(\approx\) normal with mean \(\mu\) and sd of \(\sigma / \sqrt{n}\)
By empirical rule, sample mean will be within \(2 \times \sigma / \sqrt{n}\) of the population mean 95% of the time
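The two claims above can be checked by simulation. The sketch below assumes a skewed Exponential(1) population (mean 1, sd 1), \(n = 100\), and 5000 repeated samples; all of these are illustrative choices. It compares the sd of the simulated sample means with \(\sigma / \sqrt{n}\) and counts how often a sample mean falls within two standard errors of \(\mu\).

```python
import math
import random
from statistics import pstdev

random.seed(2270)  # arbitrary seed, only so the run is reproducible

MU = 1.0      # mean of the assumed Exponential(1) population
SIGMA = 1.0   # sd of the assumed Exponential(1) population
N = 100       # size of each sample
REPS = 5000   # number of repeated samples

se = SIGMA / math.sqrt(N)  # theoretical sd of the sample mean

# Draw REPS samples of size N and keep each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(N)) / N for _ in range(REPS)
]

sd_of_means = pstdev(sample_means)
share_within = sum(abs(m - MU) <= 2 * se for m in sample_means) / REPS

print(f"theoretical sd of sample mean: {se:.3f}")
print(f"simulated sd of sample means:  {sd_of_means:.3f}")
print(f"share within 2 sd of mu:       {share_within:.3f}")
```

Even though the population is far from bell-shaped, the simulated share should land close to 95%.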
We usually only have 1 sample, so we’ll only get 1 sample mean. So why do we care about LLN/CLT?
\[ SE = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} \]
Latest Gallup poll:
Our focus: simple random sample of size \(n\), \(Y_1\) , … , \(Y_n\), from some population
Point estimation: providing a single “best guess” as to the value of some fixed, unknown quantity of interest, \(\theta\) (read theta)
Examples of quantities of interest ( estimands ):
Estimator
An estimator, \(\hat{\theta}\), of some parameter \(\theta\), is some function of the sample: \(\hat{\theta} = h(Y_1 , ... , Y_n )\).
An estimate is one particular realization of the estimator
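The estimator/estimate distinction can be sketched in code: an estimator is a function, an estimate is the number it returns for one sample. The three estimators, the sample size of 50, and the population share of 0.4 are hypothetical choices for illustration.

```python
import random
from statistics import mean, median

random.seed(2270)  # arbitrary seed, only so the run is reproducible

def draw_sample(n, p=0.4):
    """Hypothetical simple random sample of n yes/no answers, coded 1/0."""
    return [1 if random.random() < p else 0 for _ in range(n)]

# Three estimators: each is a function h(Y_1, ..., Y_n) of the sample.
estimators = {
    "sample mean": mean,
    "first observation": lambda ys: ys[0],
    "sample median": median,
}

# Two different samples give two different estimates from the same estimator.
for label in ("sample A", "sample B"):
    sample = draw_sample(50)
    estimates = {name: h(sample) for name, h in estimators.items()}
    print(label, estimates)
```

The functions stay fixed across samples; only the realized estimates change.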
There are many (\(\infty\) ?) different possible estimators:
Assume a simple random sample of n voters: \(n = 1014\)
Define random variable \(Y_i\) for Biden’s approval: \(Y_i = 1\) if respondent \(i\) supports Biden, \(Y_i = 0\) otherwise
\(Y_i\) has probability of success \(p\)
\[ \bar{Y} = \frac{1}{n} \sum_{i = 1}^{n} Y_i = \frac{\text{number who support Biden}}{n} \]
\(\theta\) \(= p\)
\(\hat\theta\) \(= \bar{Y}\)
\[ \underbrace{\text{sample mean}}_{\bar{Y}} = \underbrace{\text{population mean}}_{p} + \text{chance error} \]
Remember: the sample mean is a random variable
Expectation: average of the estimates across repeated samples
\[\sqrt{\mathrm{Var}(\bar{Y})} = \sqrt{\frac{p(1 − p)}{n}}\]
\[\sqrt{\widehat{\mathrm{Var}}(\bar{Y})} = \sqrt{\frac{\bar{Y}(1 - \bar{Y})}{n}} = \sqrt{\frac{0.42 (1 - 0.42)}{1014}} \approx 0.016\]
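The estimated standard error can be reproduced in a few lines. The sketch uses only the poll numbers from the slide (\(\bar{Y} = 0.42\), \(n = 1014\)); the variable names are illustrative.

```python
import math

n = 1014      # sample size from the poll
y_bar = 0.42  # sample proportion who support Biden

# Plug the sample proportion into sqrt(p(1 - p) / n) in place of the unknown p.
se_hat = math.sqrt(y_bar * (1 - y_bar) / n)

print(f"estimated standard error: {se_hat:.4f}")
```

The result is about 0.0155, which rounds to the 0.016 reported above.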
\[ \bar{Y} − p = \text{chance error}\]
How can we figure out a range of plausible chance errors?
\[\bar{Y} \sim N \left( \underbrace{\mathbb{E}[Y_i]}_{p}, \underbrace{\frac{\mathrm{Var}(Y_i)}{n}}_{\frac{p(1-p)}{n}} \right)\]
First, choose a confidence level.
\(100 \times (1 - \alpha)\)% confidence interval: \(CI = \bar{Y} \pm z_{\alpha/2} \times SE\)
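As a worked example, the interval formula can be applied to the poll numbers used earlier (\(\bar{Y} = 0.42\), \(n = 1014\)). The function name `proportion_ci` is a hypothetical helper, and the sketch relies on the CLT normal approximation with the plug-in standard error.

```python
import math
from statistics import NormalDist

def proportion_ci(y_bar, n, alpha=0.05):
    """Normal-approximation CI for a proportion: y_bar +/- z_{alpha/2} * SE."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    se = math.sqrt(y_bar * (1 - y_bar) / n)   # plug-in standard error
    return y_bar - z * se, y_bar + z * se

lower, upper = proportion_ci(0.42, 1014)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")  # prints 95% CI: [0.390, 0.450]
```

Choosing a smaller \(\alpha\) (for example 0.01) raises \(z_{\alpha/2}\) and widens the interval.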
Question: What data can we collect to study factors that affect election participation?
Applying CLT/LLN for estimation
This is enough if all relevant outcomes within the sample are observed
But, for causal (“what-if”) questions, we cannot observe all relevant outcomes within the sample \(\Rightarrow\) internal validity
Does the minimum wage increase the unemployment rate?
Does having a daughter affect a judge’s rulings in court?
Fundamental problem of causal inference
Question: Does having a woman as head of a village council increase the share of the budget allocated to water sanitation?
Setting: 8 randomly sampled villages in Indonesia (some with female and some with male head)
Outcome: Share of budget each village spends on water sanitation
| Village | Head of Council | Budget Share |
|---|---|---|
| Village 1 | Female | 15% |
| Village 2 | Male | 10% |
Treatment (\(T_i = 1\)) group: Villages with female head of council
Control (\(T_i = 0\)) group: Villages with male head of council
| Village | \(T_i\) (Head of Council) | \(Y_i\) (Budget Share) |
|---|---|---|
| Village 1 | 1 | 15 |
| Village 2 | 0 | 10 |
What does “\(T_i\) causes \(Y_i\)” mean?
Imagine two states of the world: one in which you receive some treatment and another in which you do not \(\Rightarrow\) potential outcomes
(Individual) Treatment effect: \(Y_i (1) − Y_i (0)\)
Average Treatment Effect (ATE):
\[ \frac{1}{n} \sum_{i = 1}^{n} Y_i (1) − \frac{1}{n} \sum_{i = 1}^{n} Y_i (0) = \frac{1}{n} \sum_{i = 1}^{n} \left[ Y_i (1) − Y_i (0) \right] \]
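The ATE identity above can be verified numerically. The sketch below imagines that both potential outcomes were known for four villages, which is never the case in practice. Only Village 1's \(Y_i(1) = 15\) and Village 2's \(Y_i(0) = 10\) match the observed data; every other number is invented for illustration.

```python
from statistics import mean

# Hypothetical potential outcomes (budget share, in percent) for four villages.
y1 = [15, 12, 20, 18]  # Y_i(1): budget share if the village has a female head
y0 = [10, 10, 16, 14]  # Y_i(0): budget share if the village has a male head

# Individual treatment effects Y_i(1) - Y_i(0).
effects = [a - b for a, b in zip(y1, y0)]

ate_mean_of_diffs = mean(effects)         # average of individual effects
ate_diff_of_means = mean(y1) - mean(y0)   # difference of the two averages

print("individual effects:", effects)
print("ATE (mean of differences):", ate_mean_of_diffs)
print("ATE (difference of means):", ate_diff_of_means)
```

Both calculations give the same number, as the equation states.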
| Village | \(T_i\) (Head of Council) | \(Y_i\) (Budget Share) | \(Y_i (0)\) (Budget Share if Male Head) | \(Y_i (1)\) (Budget Share if Female Head) |
|---|---|---|---|---|
| Village 1 | 1 | 15 | ??? | 15 |
| Village 2 | 0 | 10 | 10 | ??? |
Fundamental problem of causal inference:
Observe \(Y_i = Y_i (1)\) if \(T_i = 1\) or \(Y_i = Y_i (0)\) if \(T_i = 0\)
Find a similar unit! \(\Rightarrow\) matching
Did village spend more on water sanitation because of female council head?
NJ increased the minimum wage. Causal effect on unemployment?
The problem: imperfect matches!
Say we match villages \(i\) (treated) and \(j\) (control)
Selection Bias: \(Y_i (1) \neq Y_j (1)\) or \(Y_i (0) \neq Y_j (0)\)
Those who take treatment may be different from those who take control.
How can we correct for that?
RANDOMIZE! 😵💫
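A simulation sketch of why randomization helps. Everything here is hypothetical: 4000 simulated villages, a constant treatment effect of 2 percentage points, and a self-selection rule under which villages that would spend more anyway are more likely to be treated. The comparison is between the difference in means under self-selection and under random assignment.

```python
import random
from statistics import mean

random.seed(2270)  # arbitrary seed, only so the run is reproducible

N = 4000           # number of simulated villages (hypothetical)
TRUE_EFFECT = 2.0  # assumed constant treatment effect, so the true ATE is 2

y0 = [random.gauss(10, 3) for _ in range(N)]  # Y_i(0), hypothetical
y1 = [y + TRUE_EFFECT for y in y0]            # Y_i(1) = Y_i(0) + effect

def diff_in_means(treated):
    """Mean observed outcome of treated units minus that of control units."""
    treated_outcomes = [y1[i] for i in range(N) if treated[i]]
    control_outcomes = [y0[i] for i in range(N) if not treated[i]]
    return mean(treated_outcomes) - mean(control_outcomes)

# Self-selection: villages with high Y_i(0) are more likely to be treated.
selected = [y0[i] + random.gauss(0, 3) > 10 for i in range(N)]

# Randomization: exactly half the villages are treated, chosen at random.
treated_ids = set(random.sample(range(N), N // 2))
randomized = [i in treated_ids for i in range(N)]

selected_estimate = diff_in_means(selected)
randomized_estimate = diff_in_means(randomized)

print(f"true ATE:                  {TRUE_EFFECT:.2f}")
print(f"self-selected comparison:  {selected_estimate:.2f}")
print(f"randomized comparison:     {randomized_estimate:.2f}")
```

Under self-selection the comparison mixes the treatment effect with pre-existing differences between groups; under random assignment the two groups are similar on average, so the difference in means lands near the true ATE.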